# GeneOH Diffusion: Towards Generalizable Hand-Object Interaction Denoising via Denoising Diffusion

The code is adapted from [Human Motion Diffusion](https://guytevet.github.io/mdm-page/).

## Key Implementations

### GeneOH

The representation is computed in the `__getitem__` function of each dataloader contained in `./data_loaders/humanml/data/dataset_ours.py` and `./data_loaders/humanml/data/dataset_ours_single_seq.py`.

The returned dictionary contains everything GeneOH needs. As an example, the `__getitem__` function of the class `GRAB_Dataset_V19` returns the following `rt_dict`:

```python
rt_dict = {
    'base_pts': base_pts,  # generalized contact points
    'base_normals': base_normals,  # generalized contact point normals
    'obj_verts': obj_verts,  # additionally returned object verts
    'obj_normals': obj_normals,  # additionally returned object normals
    'obj_faces': obj_faces,  # additionally returned object faces
    'obj_rot': object_global_orient_mtx_th,  # additionally returned object rotations
    'obj_transl': object_trcansl_th,  # additionally returned object translations
    'rel_base_pts_to_rhand_joints': rel_base_pts_to_rhand_joints,  # relative hand-object point pair offsets (canonicalized)
    'rhand_joints': rhand_joints,  # canonicalized hand trajectory
    'rhand_verts': rhand_verts,  # canonicalized hand vertices
    'rhand_transl': rhand_transl_var,  # additionally returned MANO parameter
    'rhand_rot': rhand_global_orient_var,  # additionally returned MANO parameter
    'rhand_theta': rhand_pose_var,  # additionally returned MANO parameter
    'rhand_betas': rhand_beta_var,  # additionally returned MANO parameter
    'per_frame_avg_disp_along_normals': per_frame_avg_disp_along_normals,  # Mean of e_{k, \perp}^{ho}
    'per_frame_std_disp_along_normals': per_frame_std_disp_along_normals,  # Std of e_{k, \perp}^{ho}
    'per_frame_avg_disp_vt_normals': per_frame_avg_disp_vt_normals,  # Mean of e_{k, \parallel}^{ho}
    'per_frame_std_disp_vt_normals': per_frame_std_disp_vt_normals,  # Std of e_{k, \parallel}^{ho}
    'e_disp_rel_to_base_along_normals': e_disp_rel_to_base_along_normals,  # e_{k, \perp}^{ho}
    'e_disp_rel_to_baes_vt_normals': e_disp_rel_to_baes_vt_normals,  # e_{k, \parallel}^{ho}
    'vel_obj_pts_to_hand_pts': vel_obj_pts_to_hand_pts,  # v_{k}^{ho}
    'obj_pts_disp': obj_pts_disp,  # v_k^o
    'dist_base_pts_to_rhand_joints': dist_base_pts_to_rhand_joints,  # d_k^{ho} relative hand-object point pair distances
}
```
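
To make the roles of a few of these fields more concrete, here is a minimal NumPy sketch of how point-pair offsets, distances, and relative velocities between hand joints and contact points could be computed. It is an illustration only, not the code in `dataset_ours.py`: the function name `hand_object_relations`, the output keys, the array shapes, and the axis ordering are all assumptions, and the `e_*` quantities and their per-frame statistics are not reproduced here.

```python
import numpy as np


def hand_object_relations(rhand_joints, base_pts, base_normals):
    """Hypothetical helper; shapes and conventions are assumptions.

    rhand_joints: (T, J, 3) hand joints per frame
    base_pts:     (T, P, 3) contact points per frame
    base_normals: (T, P, 3) unit normals at the contact points
    """
    # Offset from every contact point to every hand joint: (T, P, J, 3)
    rel_offsets = rhand_joints[:, None, :, :] - base_pts[:, :, None, :]
    # Point-pair distances: (T, P, J)
    dists = np.linalg.norm(rel_offsets, axis=-1)

    # Finite-difference displacements between consecutive frames
    hand_disp = rhand_joints[1:] - rhand_joints[:-1]  # (T-1, J, 3)
    obj_disp = base_pts[1:] - base_pts[:-1]           # (T-1, P, 3)
    # Hand displacement relative to the contact points: (T-1, P, J, 3)
    rel_vel = hand_disp[:, None, :, :] - obj_disp[:, :, None, :]

    # Split the relative velocity into a part along the contact normal
    # and a part tangential to the surface.
    normals = base_normals[:-1, :, None, :]                # (T-1, P, 1, 3)
    vel_along = np.sum(rel_vel * normals, axis=-1)         # (T-1, P, J), signed
    vel_tangent = rel_vel - vel_along[..., None] * normals
    vel_tangent_norm = np.linalg.norm(vel_tangent, axis=-1)

    return {
        "rel_offsets": rel_offsets,
        "dists": dists,
        "obj_disp": obj_disp,
        "rel_vel": rel_vel,
        "vel_along_normals": vel_along,
        "vel_tangential": vel_tangent_norm,
    }
```

Quantities built from frame differences have one fewer frame than the inputs, which is worth keeping in mind when comparing them with per-frame fields.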

### Progressive HOI Denoising

Please refer to `GaussianDiffusionV5` in `./diffusion/gaussian_diffusion_ours.py` for the implementation of the denoising models for the canonical hand trajectory $\bar{\mathcal{J}}$ and the hand-object spatial relations $\mathcal{S}$. Hyper-parameters select which of the two representations the model denoises.

Please refer to `GaussianDiffusionV4` in `./diffusion/gaussian_diffusion_ours.py` for the implementation of the denoising model for the hand-object temporal relations $\mathcal{T}$.

Please refer to `./scripts/train` for training scripts and `./scripts/val` for test scripts. 

## File Structure

### Argument parser

- The argument parsers are contained in the file `./utils/parser_util.py`.

### Data loaders

- Dataloaders for *training* are contained in the file `./data_loaders/humanml/data/dataset_ours.py`.
- Dataloaders for *evaluation* are contained in the file `./data_loaders/humanml/data/dataset_ours_single_seq.py`.

### Models

- Denoising models are contained in `./diffusion/gaussian_diffusion_ours.py`. The class `GaussianDiffusionV4` is for the temporal relation representation $\mathcal{T}$, while the class `GaussianDiffusionV5` is for the canonical hand trajectory $\bar{\mathcal{J}}$ and the hand-object spatial relations $\mathcal{S}$.
- Basic networks used in the denoising models are in `./model/mdm_ours.py`. 

### Trainer

- The root file is `./train/train_mdm.py`.

### Scripts 

- `./scripts/train` for training scripts
- `./scripts/val` for test scripts

## Environment

```bash
conda env create -f environment.yml
```

This command will create an environment named `geneohdiffusion`.  

## Data Preprocessing

We need the hand MANO parameters and object meshes with vertex normals. The GRAB and HOI4D datasets provide this raw information directly. ARCTIC's object models, however, do not include object vertex normals, so we compute them with the function `compute_vertex_normals` implemented in Open3D.
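
As a rough illustration of what per-vertex normals are, the sketch below computes them with plain NumPy by accumulating area-weighted face normals at each vertex. This is a stand-in for explanation, not the preprocessing code of this repository: the function name `vertex_normals` is hypothetical, and the result may differ from Open3D's output in weighting details.

```python
import numpy as np


def vertex_normals(verts, faces):
    """Hypothetical NumPy stand-in for per-vertex normal computation.

    verts: (V, 3) float array of vertex positions
    faces: (F, 3) int array of triangle vertex indices
    """
    verts = np.asarray(verts, dtype=np.float64)
    faces = np.asarray(faces, dtype=np.int64)

    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    # Unnormalized face normals; their length is twice the triangle area,
    # so summing them gives an area-weighted average.
    face_normals = np.cross(v1 - v0, v2 - v0)

    normals = np.zeros_like(verts)
    for corner in range(3):
        # np.add.at handles vertices shared by several faces correctly.
        np.add.at(normals, faces[:, corner], face_normals)

    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    return normals / np.clip(lengths, 1e-12, None)


# With Open3D, the corresponding calls on a loaded TriangleMesh `mesh` are
# mesh.compute_vertex_normals() followed by np.asarray(mesh.vertex_normals).
```

Vertices that belong to no face keep a zero normal in this sketch.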

## Usage

### Training

To train the denoising model for $\bar{\mathcal{J}}$, run 

```bash
bash scripts/train/train_motion_diff.sh
```

where the training dataset is chosen via the `use_arctic` argument. The default value is `False`, in which case the GRAB training set is used for training.

To train the denoising model for $\mathcal{S}$, run

```bash
bash scripts/train/train_spatial_diff.sh
```

To train the denoising model for $\mathcal{T}$, run

```bash
bash scripts/train/train_temporal_diff.sh
```

### Test

For the GRAB test set and GRAB (Beta) test set, run

```bash
bash scripts/val/predict_ours_objbase_bundle_arctic_rndseed.sh
```

where the stage to run should be specified in the script. The argument `pert_type` controls whether the GRAB test set or the GRAB (Beta) test set is used for evaluation. Please refer to the comments in the script file for details on both settings.

For the HOI4D test set, run

```bash
bash scripts/val/predict_ours_objbase_bundle_hoi4d_rndseed.sh
```

where the stage to run should be specified in the script. Please refer to the comments in the script file for details.

For the ARCTIC test set, run

```bash
bash scripts/val/predict_ours_objbase_bundle_arctic_rndseed.sh
```

where the stage to run should be specified in the script. Please refer to the comments in the script file for details.

For hand mesh trajectory reconstruction, run

```bash
bash scripts/val/reconstruct_bundle.sh
```

## TODOs

We will release the pre-processed data for each dataset and the pre-trained model weights, and will further clean up the code and add more documentation in the future.









